To simplify visualization of various GenomeRunner results, a wrapper function, showHeatmap, has been created. See the utils.R file for more details.
The enrichment p-values can be corrected for multiple testing.
The showHeatmap function returns filtered and clustered matrices of the -log10-transformed p-values; in the calls in this document the result is stored in mtx.sig.
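As a rough illustration of what correcting and transforming the p-values involves, the sketch below uses base R's p.adjust. It is not taken from utils.R: the vector pvals is hypothetical, and the convention that a negative raw p-value marks depletion is an assumption made here to mirror the sign convention of the transformed values.

```r
# Hypothetical raw enrichment p-values; a negative sign marks depletion
# (an assumption for this sketch, not confirmed by utils.R).
pvals <- c(0.001, -0.02, 0.3, 0.04)

# Correct the magnitudes for multiple testing (Benjamini-Hochberg FDR).
padj <- p.adjust(abs(pvals), method = "fdr")

# -log10-transform and restore the direction:
# positive values = enrichment, negative values = depletion.
mlog10 <- -log10(padj) * sign(pvals)
print(round(mlog10, 2))
```

With adjust = "none" the correction step would simply be skipped, so the transformed values reflect the unadjusted p-values.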
source("utils.R")
dirs <- list.dirs("data/", full.names = TRUE, recursive = FALSE) # Paths to the SNP set analyses
Input: an N x M enrichment matrix, where N is the number of regulatory datasets, M is the number of sets of features of interest (FOIs, e.g., SNP sets), and each cell represents the corresponding enrichment p-value.
Output: a heatmap of Pearson or Spearman correlation coefficients among the sets of FOIs (columns), and a numerical matrix of correlation coefficients used for the heatmap. The numerical matrix can be saved into a file.
mtx.sig <- showHeatmap(paste(dirs[2], "matrix.txt", sep = "/"), colnum = seq(1, 50), factor = "none", cell = "none", isLog10 = FALSE, adjust = "none", pval = 0.5, numtofilt = 1, toPlot = "corrSpearman")
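The correlation step described above can be sketched with base R alone. This is a minimal illustration and not the code in utils.R: the matrix mtx, its row and column names, and the output file are all hypothetical.

```r
set.seed(1)

# Hypothetical 6 x 3 matrix of -log10-transformed p-values:
# rows are regulatory datasets, columns are sets of FOIs.
mtx <- matrix(rnorm(18), nrow = 6,
              dimnames = list(paste0("dataset", 1:6),
                              c("setA", "setB", "setC")))

# Spearman correlation among the columns (FOI sets);
# use method = "pearson" for Pearson coefficients.
mtx.cor <- cor(mtx, method = "spearman", use = "pairwise.complete.obs")

# Save the numerical matrix as a tab-separated file.
out <- tempfile(fileext = ".txt")
write.table(mtx.cor, out, sep = "\t", quote = FALSE, col.names = NA)
```

The resulting M x M matrix is symmetric with ones on the diagonal, which is the form a correlation heatmap is typically drawn from.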
There are two particularly interesting categories of regulatory datasets provided by the ENCODE project:
Information about the cell lines used in the ENCODE project can be found at the ENCODE cell types portal.
Input: an enrichment matrix. Specify which column and which category to use for visualization. Tweak the number of missing values (NAs) allowed in rows/columns - the rows/columns having more NAs will be filtered out.
Output: a heatmap of cell x regulatory mark enrichment results, and a numerical matrix of -log10-transformed p-values used for the heatmap. The color key shows the range of the -log10-transformed p-values.
The numerical matrix can be saved into a file. The -log10-transformed p-values can be converted back to the regular p-value scale in Excel using the formula =1/POWER(10, ABS(A1))*SIGN(A1). Note that a “-” sign indicates significant depletion instead of enrichment.
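The same back-conversion can be done directly in R. The helper below is a hypothetical name introduced for illustration; it mirrors the Excel formula term by term, so the sign of the result still encodes enrichment versus depletion.

```r
# Convert signed -log10-transformed p-values back to the p-value scale.
# Mirrors =1/POWER(10, ABS(A1))*SIGN(A1); like the Excel formula,
# an input of exactly 0 yields 0 because sign(0) is 0.
to_pvalue <- function(x) sign(x) / 10^abs(x)

signed_log <- c(3, -2)
print(to_pvalue(signed_log))
```

Because the function is vectorized, it can be applied to a whole matrix, for example to mtx.sig, in a single call.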
Instead of using all available cell lines, subsets of tissue-specific cell lines can be used. For example:
The ENCODE datasets are cell-type incomplete: for one cell type the data for all histone marks may be available, while another cell type may have only a few histone mark datasets. As a result, the matrices of cells x marks have many missing values. They have to be filtered to remove rows/columns containing too many missing values; otherwise, the clustering and visualization algorithms break. The heatmaps of such filtered matrices are shown.
mtx.sig <- showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = 62, factor = "Histone", cell = "none", isLog10 = FALSE, adjust = "none", pval = 0.1, numtofilt = 2, toPlot = "heat")
mtx.sig <- showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = 102, factor = "Histone", cell = "none", isLog10 = FALSE, adjust = "none", pval = 0.1, numtofilt = 4, toPlot = "heat")
mtx.sig <- showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = 102, factor = "Tfbs", cell = "none", isLog10 = FALSE, adjust = "none", pval = 1, numtofilt = 7, toPlot = "heat")
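The filtering of rows/columns with too many missing values can be sketched as follows. This is an illustration under stated assumptions: the matrix, its names, and the threshold max_na are hypothetical, and how utils.R actually interprets numtofilt is not shown here.

```r
# Hypothetical cells x marks matrix with missing values.
mtx <- matrix(c( 1, NA,  3,
                NA, NA, NA,
                 2,  5,  1,
                 4, NA,  2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("cell", 1:4),
                              c("H3k4me1", "H3k4me3", "H3k27ac")))

max_na <- 1  # hypothetical threshold: at most one NA per row/column

# Drop rows with more than max_na missing values...
filt <- mtx[rowSums(is.na(mtx)) <= max_na, , drop = FALSE]
# ...then drop columns that still exceed the threshold.
filt <- filt[, colSums(is.na(filt)) <= max_na, drop = FALSE]
print(filt)
```

Filtering rows first and columns second is a design choice of this sketch; the opposite order can keep a different subset of the matrix.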
Each bar represents the -log10-transformed enrichment p-value: the higher the bar, the more significant the enrichment.
mtx.sig <- showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = 102, factor = "Histone", cell = "Monocd14ro1746", isLog10 = FALSE, adjust = "fdr", pval = 0.1, numtofilt = 7, toPlot = "barup")
One or several comparisons can be plotted. Note that if two conditions are plotted, the barplot is split into two parts: one part shows the most significant enrichments for the first condition, while the other shows the most significant enrichments for the second condition.
Both overrepresented and underrepresented barplots can be plotted.
mtx.sig <- showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = c(62, 102), factor = "Histone", cell = "Gm12878", isLog10 = FALSE, adjust = "fdr", pval = 0.1, numtofilt = 7, toPlot = "barup")
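The idea of showing the most significant enrichments for each condition can be sketched in base R. The data, the mark names, the condition names, and top_n are hypothetical, and the selection logic is an assumption rather than the implementation in utils.R.

```r
# Hypothetical signed -log10 p-values: rows are marks, columns are conditions.
mtx <- matrix(c(5, 1, -3, 0.5,
                0.2, 4, -1, 6),
              ncol = 2,
              dimnames = list(c("H3k4me1", "H3k4me3", "H3k27me3", "H3k9me3"),
                              c("cond1", "cond2")))

top_n <- 2  # hypothetical number of top marks to keep per condition

# For each condition, pick the top_n most enriched marks (largest values).
top_up <- lapply(colnames(mtx), function(cn) {
  names(sort(mtx[, cn], decreasing = TRUE))[seq_len(top_n)]
})
sel <- unique(unlist(top_up))

# Draw grouped bars, one bar per condition for every selected mark.
if (interactive()) {
  barplot(t(mtx[sel, , drop = FALSE]), beside = TRUE, las = 2,
          ylab = "-log10(p-value)", legend.text = TRUE)
}
```

Sorting with decreasing = FALSE instead would select the most depleted (underrepresented) marks in the same way.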
showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = c(62, 102, 135, 195), factor = "H3k4me1|H3k4me2", cell = "Gm12878", isLog10 = FALSE, adjust = "none", pval = 0.1, numtofilt = 1, toPlot = "lines")
Regulatory similarity analysis compares SNP set-specific regulatory enrichment profiles using the Pearson or Spearman correlation coefficient.
mtx.sig <- showHeatmap(paste(dirs[1], "matrix.txt", sep = "/"), colnum = c(62, 102, 135, 195), factor = "Histone", cell = "Gm12878", isLog10 = FALSE, adjust = "none", pval = 0.1, numtofilt = 1, toPlot = "corrPearson")